Abstract

In recent years, many DHT-based P2P systems have 
been proposed, analyzed, and certain deployments have 
reached a global scale with nearly one million nodes. One 
is thus faced with the question of which particular DHT sys- 
tem to choose, and whether some are inherently more ro- 
bust and scalable. Toward developing such a comparative 
framework, we present the reachable component method 
(RCM) for analyzing the performance of different DHT 
routing systems subject to random failures. We apply RCM 
to five DHT systems and obtain analytical expressions that 
characterize their routability as a continuous function of 
system size and node failure probability. An important con- 
sequence is that in the large-network limit, the routability of 
certain DHT systems goes to zero for any non-zero probability
of node failure. These DHT routing algorithms are therefore 
unscalable, while some others, including Kademlia, which 
powers the popular eDonkey P2P system, are found to be 
scalable. 



1 Introduction 

Developing scalable and fault tolerant systems to lever- 
age and utilize the shared resources of distributed comput- 
ers has been an important research topic since the dawn of 
computer networking. In recent years, the popularity and
wide deployment of peer-to-peer (P2P) systems have inspired
the development of distributed hash tables (DHTs). DHTs
typically offer scalable O(log N) routing latency and an
efficient lookup interface. According to a recent study [12],
the DHT based file-sharing network eDonkey is emerging 
as one of the largest P2P systems with millions of users 
and accounting for the largest fraction of P2P traffic, while 
P2P traffic currently accounts for 60% of the total Internet 
bandwidth. Given the transient nature of P2P users, analyzing
and understanding the robustness of DHT routing algorithms
in the asymptotic system size limit under unreliable
environments becomes essential.

In the past few years, there has been a growing number 
of newly proposed DHT routing algorithms. However, in 
the DHT routing literature, there have been few papers that 
provide a general analytical framework to compare across 
the myriad routing algorithms. In this paper, we develop a 
method to analyze the performance and scalability of differ- 
ent DHT routing systems under random failures of nodes. 
We would like to emphasize that we intend to analyze the 
performance of the basic routing geometry and protocol. 
In a real system implementation, there is no doubt that a
system designer has many optional features, such as additional
sequential neighbors, to provide improved fault tolerance.
Nevertheless, the analysis of the basic routing geometry
gives insight and good guidelines for comparing systems.

In this paper, we investigate the routing performance of 
five DHT systems with uniform node failure probability q. 
Such a failure model, also known as the static resilience
model¹, is assumed in the simulation study done by Gummadi
et al. [2]. A static failure model is well suited for
analyzing performance in the shorter time scale. In a DHT, 
very fast detection of faults is generally possible through 
means such as TCP timeouts or keep-alive messages, but 
establishing new connections to replace the faulty nodes is 
more time and resource consuming. The applicability of the 
results derived from this static model to dynamic situations, 
such as churn, is currently under study. 

Intuitively, as the node failure probability q increases, the 
routing performance of the system will worsen. A quantitative
metric, called routability, is needed to characterize the
routing performance of a DHT system under random failure:

Definition 1 The routability of a DHT routing system is
the expected number of routable pairs of nodes divided by
the expected number of possible pairs among the surviving
nodes. In other words, it is the fraction of surviving routing
paths in the system. In general, routability is a function of
the node failure probability q and system size N.

¹ The term static refers to the assumption that a node's routing
table remains unchanged after accounting for neighbor failures.

As the DHT-based eDonkey is reaching global scale, it is 
important to study how DHT systems perform as the num- 
ber of nodes reaches millions or even billions. In fact, we 
know from site percolation theory [15] that if $q > 1 - p_c$,
where $p_c$ is called the percolation threshold of the underlying
network, then for large enough network size the network
will fragment into very small connected components.
As a result, the routability of the network will
approach zero for such failure probability due to the lack 
of connectivity. However, because of how messages get 
routed as specified by the underlying routing protocol, all 
pairs belonging to the same connected component need not 
be reachable under failure. 

In general, the sizes of the connected components do not
directly give us the routability of the subnetworks. Hence,
one needs to develop a framework different from the well- 
known framework of percolation. As a result, this work in- 
vestigates DHT routability under the random failure model 
for both finite system sizes and the infinite limit. We will 
define the scalability of a routing system as follows: 

Definition 2 A DHT routing system is said to be scalable if
and only if its routability converges to a nonzero value as the
system size goes to infinity for a nonzero failure probability
q. Mathematically, it is defined as follows:

$$\lim_{N \to \infty} r(N, q) > 0 \quad \text{for } 0 < q < 1 - p_c$$

where $r(N, q)$ denotes the routability of the system as a
function of system size N and failure probability q. Similarly,
the system is said to be unscalable if and only if its
routability converges to zero as the system size goes to
infinity for a nonzero failure probability q:

$$\lim_{N \to \infty} r(N, q) = 0 \quad \text{for } 0 < q < 1 - p_c$$

We want to emphasize that in a real implementation, there 
are many system parameters that the system designer can 
specify, such as the number of near neighbors or sequential 
neighbors. As a result, the designer can always add enough 
sequential neighbors to achieve an acceptable routability 
under reasonable node failure probability for a maximum 
network size that exceeds the expected number of nodes that 
will participate in the system. The scalability definition is 
provided for examining the theoretical asymptotic behavior 
of DHT routing systems, not for claiming a DHT system is 
unsuitable for any large-scale deployment. 

Having specified the definition of the key metrics, we 
will present the reachable component method (RCM), a
simple yet effective method for analyzing DHT routing
performance under random failure. We apply RCM
to analyze the basic routing algorithms used in the following 
five DHT systems: Symphony [10], Kademlia [11], Chord 
[16], CAN [14] and Plaxton routing based systems [13]. For 
all algorithms except Chord routing, we derive the analyt- 
ical expression for each algorithm's routability under ran- 
dom failure, while an analytical expression for a tight lower 
bound is obtained for Chord routing. Our analytical
results match the simulation results reported in [2],
where different DHT systems were simulated and the
percentage of failed paths (i.e., 1 - routability) was estimated
for $N = 2^{16}$, as illustrated in Fig. 6. In addition, we also derive
the asymptotic performance of the routing algorithm under 
failure as the system scales. 

One interesting finding of this paper is that under ran- 
dom failure, the basic DHT routing systems can be classi- 
fied into two classes: scalable and unscalable. For example, 
the XOR routing scheme of Kademlia is found to be scal- 
able, since the routability of the system under nonzero prob- 
ability of failure converges very fast to a positive limit even 
as the size of the system tends to infinity. This is consistent 
with the observation that the Kademlia-based popular P2P 
network eDonkey is able to scale to millions of nodes. In 
contrast, as the system scales, the routability of Symphony's 
routing scheme is found to quickly converge to zero for any 
failure probability greater than zero. Thus, the basic routing 
system for Symphony is found to be unscalable. However, 
as briefly discussed above in this section, a system designer 
for Symphony can specify enough near neighbors to guar- 
antee an acceptable routability in the system for a maximum 
network size and a reasonable failure probability q. 

The rest of this paper is organized as follows. In section 2,
we discuss previous work on the fault tolerance of P2P
routing systems. In section 3, we give an overview of
the DHT routing systems that we intend to analyze. In
section 4, we present the reachable component method (RCM)
and apply it to several DHT systems. In section 5, we
examine the scalability of DHT routing systems. In
section 6, we give our concluding remarks.

2 Related Work 

The study of robustness in routing networks has grown in 
the past few years with researchers simulating failure con- 
ditions in DHT-based systems. Gummadi et al. [2] showed 
through simulation results that the routing geometry of each 
system has a large effect on the network's static resilience 
to random failures. In addition, research has been
done on analyzing and simulating dynamic
failure conditions (i.e. churn) in DHT systems [5,7,8].

Theory work has been done to predict the performance 
of DHT systems under a static failure model. The two
main approaches thus far have been graph theoretic methods
[1,6,9] and Markov processes [17]. Most analytical
work to date has dealt with one or two routing algorithms to
which the respective methods are well suited, but has not
provided comparisons across a large fraction of the DHT
algorithms. Angel et al. [1] use percolation theory to place
tight bounds on the critical failure probability that can support
efficient routing on both hypercube and d-dimensional
mesh topologies. By efficient they mean that it is possible 
to route between two nodes with time complexity on the 
order of the network diameter. While this method predicts 
the point at which the network becomes virtually unusable, 
it does not allow the detailed characterization of routability
as a function of the failure probability. In contrast, the
reachable component method (RCM) exploits the
geometries of DHT routing networks and leads to simple
analytical results that predict routing performance for
arbitrary network sizes and failure probabilities.

3 Overview of DHT Routing Protocols 

We will first review the five DHT routing algorithms that 
we intend to analyze. An excellent discussion of the geo- 
metric interpretation of these routing algorithms (except for 
Symphony) is provided by Gummadi et al. [2] and we use 
the same terms for the geometric interpretations of DHT 
routing systems in this paper (e.g. hypercube and ring ge- 
ometry for CAN and Chord routing systems, respectively). 
By following the algorithm descriptions in [2] as well as the
descriptions in this section, one can construct Markov chain
models (e.g. Fig. 4) for the DHT routing algorithms. The
application of the Markov chain models will be discussed
in sections 4.1 and 4.2.

In addition, we will use the notation of phases as used 
in [3]: we say that the routing process has reached phase j if 
the numeric distance (used in Chord and Symphony) or the 
XOR distance (used in Kademlia) from the current message 
holder to the target is between $2^j$ and $2^{j+1}$. In addition,
we will use binary strings as identifiers although any other 
base besides 2 can be used. Finally, for those systems that 
require resolving node identifier bits in order, we use the 
convention of correcting bits from left to right. 

3.1 Tree (Plaxton) 

Each node in a tree-based routing geometry has log N
neighbors, with the ith neighbor matching the first i-1 bits
and differing on the ith bit. When a source node, S, wishes
to route to a destination, D, the routing can only be successful
if one of the neighbors of S, denoted Z, shares a
prefix with D and matches D on the highest-order bit in which
S and D differ. Each successful step in the routing results in
the highest-order differing bit being corrected until no bits differ.



The routing Markov chain (Fig. 4(a)) for the tree geometry
can easily be generated by examining the possible failure
conditions during routing. At each step in the routing pro- 
cess, the neighbor that will correct the leftmost bit must be 
present in order for the message to be routed. Otherwise, 
the message is dropped and routing fails. 

3.2 Hypercube (CAN) 

In the hypercube geometry, each node's identifier is a 
binary string representing its position in the d-dimensional 
space. The distance between nodes is simply the Hamming 
distance of the two addresses. The number of possible paths 
that can correct a bit is reduced by 1 with each successful 
step in the route. This fact makes the creation of the hypercube
routing Markov chain (Fig. 4(b)) straightforward.

3.3 XOR (Kademlia) 

In XOR routing [11], the distance between two nodes is
the numeric value of the XOR of their node identifiers. Each
node keeps log N connections, with the ith neighbor chosen
uniformly at random from an XOR distance in the range
$[2^{d-i}, 2^{d-i+1}]$ away. Messages are delivered by routing
greedily in the XOR distance at each hop. Moreover, it is a
simple exercise to show that choosing a neighbor at an XOR
distance of $[2^{d-i}, 2^{d-i+1}]$ away is equivalent to choosing a
neighbor by matching the first (i-1) bits of one's identifier,
flipping the ith bit, and choosing random bits for the rest of
the bits.
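The equivalence stated above can be checked by brute force for small identifier lengths. The following is a minimal sketch, not part of the original analysis: the function names are hypothetical, and the distance range is treated as half-open, $[2^{d-i}, 2^{d-i+1})$, which is an assumption made here so that the ranges for different i do not overlap.

```python
def xor_bucket(node: int, i: int, d: int) -> set[int]:
    """IDs whose XOR distance to `node` lies in [2^(d-i), 2^(d-i+1)).

    The half-open upper bound is an assumption of this sketch.
    """
    lo, hi = 1 << (d - i), 1 << (d - i + 1)
    return {other for other in range(1 << d) if lo <= (node ^ other) < hi}


def prefix_bucket(node: int, i: int, d: int) -> set[int]:
    """IDs that match the first i-1 bits of `node` and differ on the ith bit."""
    shift = d - i  # number of bits to the right of the ith bit (from the left)
    flipped_prefix = (node >> shift) ^ 1  # keep first i-1 bits, flip the ith
    return {(flipped_prefix << shift) | tail for tail in range(1 << shift)}


def buckets_agree(d: int) -> bool:
    """Check the equivalence for every node and every neighbor index."""
    return all(
        xor_bucket(node, i, d) == prefix_bucket(node, i, d)
        for node in range(1 << d)
        for i in range(1, d + 1)
    )


# Example: first-neighbor candidates of node 010 in a 3-bit identifier space.
first_bucket_of_010 = xor_bucket(0b010, 1, 3)
```

For node 010 with d = 3, the first-neighbor candidates are exactly the four identifiers whose leftmost bit is 1, which is the set from which the tree geometry would also draw its first neighbor.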

Effectively, this construction is equivalent to the Plaxton-tree
routing geometry. As a result, when there are no failures,
the XOR routing protocol resolves node identifier bits from 
left to right as in the Plaxton-tree geometry. However, when 
the system experiences node failures, nodes have the option 
to route messages to neighbors that resolve lower order bits 
when the neighbor that would resolve the highest order bit is 
not available. Note that resolving lower order bits will also 
make progress in terms of decreasing the XOR distance to 
destination. Nonetheless, the progress made by resolving 
lower order bits is not necessarily preserved in future hops 
or phases (see Fig. 5(a)).

For example, at the start of the routing process, one phase 
is advanced if the neighbor correcting the leftmost bit exists. 
Otherwise, the routing process can correct one of the lower 
order bits. However, if all of the neighbors that would re- 
solve bits have failed, the routing process fails. A Markov 
chain model for the routing process is illustrated in Fig. 5(b).

3.4 Ring (Chord) 

In Chord [16], nodes are placed in numerical order
around a ring. Each node with identifier a maintains
log N connections or fingers, with the ith finger at a distance
in the range $[2^{d-i}, 2^{d-i+1}]$ away (the randomized version of
Chord is discussed here). Routing can be done greedily on the ring.
When the system experiences failure, each node will continue
to route a message to the neighbor closest to the destination
(i.e. in a greedy manner). A Markov chain model for
the routing process is illustrated in Fig. 8(a).

3.5 Small-World (Symphony)

Small-world routing networks in the 1-dimensional case
have a ring-like address space where each node is connected
to a constant number of its nearest neighbors and a constant
number of shortcuts that have a 1/d distance distribution (d
is the ring-distance between the end-points of the shortcut).
Each node maintains a constant number of neighbors and
uses greedy routing. Due to the distance distribution it will
take an average of O(log N) hops before routing halves the
distance to a target node, therefore requiring O(log N) such
phases to reach a target node for a total expected latency of
$O(\log^2 N)$.

When the system experiences node failures, some of the
shortcuts will be unavailable and the route will have to take
"suboptimal" hops. The small-world Markov chain model
is fundamentally different from the ones for XOR routing
(Fig. 5(b)) and ring routing (Fig. 8(a)). A routing phase
is completed if any of the node's shortcuts connects to the
desired phase. This happens with probability $\frac{k_s}{d}$, where $k_s$
denotes the number of shortcuts that each node maintains.
Alternatively, the routing fails if all of the node's near neighbor
and shortcut connections fail, which happens with probability
$q^{k_n + k_s}$, where $k_n$ denotes the number of near neighbors.
If neither of the above happens then the
route takes a suboptimal hop, which happens with probability
$1 - \frac{k_s}{d} - q^{k_n + k_s}$.

4 Reachable Component Method and its Applications

4.1 Method Description 

We now describe the steps of the reachable component 
method (RCM) in calculating the routability of a DHT rout- 
ing system under random failure. Before we delve into the 
description, let us first clarify several concepts and notations
on DHT routing. First, we allow all DHTs to fully
populate their identifier spaces (i.e. node identifier length
$d = \log_b N$ for identifiers in base b). Second, when a DHT is
not in its perfect topological state, it can be the case that a
pair of nodes are in the same connected component but cannot
route to each other. Thus, the reachable component of
node i is the set of nodes that node i can route to under the
given routing algorithm. Note that the reachable component
of node i is a subset of the connected component containing
node i. Third, we assume that no "back-tracking" is
allowed (i.e. when a node cannot forward a message further,
the node is not allowed to return the message to
the node from which the message was received).

RCM is fairly simple in concept and involves the follow- 
ing five steps: 

1. Pick a random node, node i, from the system and de- 
note it as the root node. Construct the root node's rout- 
ing topology from the routing algorithm of the system 
(i.e. the topology by which the root node routes to all 
other nodes in the system). 

2. Obtain the distribution of the distances (in hops or in
phases) between the root node and all other nodes (denoted
as n(h)); in other words, for each integer h, calculate
the number of nodes at distance h hops from the
root node. Note that the meaning of hops or phases
will be clear from the context.

3. Compute the probability of success, p(h, q), for routing
to a node h hops away from the root node under a
uniform node failure probability, q.

4. Compute the expected size of the reachable component
from the root node by first calculating the expected
number of reachable nodes at distance h hops away
(which is simply given by n(h) p(h, q)). Then
sum over all possible numbers of hops to obtain the expected
size of the reachable component.

5. By inspection, the expected number of routable pairs 
in the system is given by summing all surviving nodes' 
expected reachable component sizes. Then, dividing 
the expected number of routable pairs by the number 
of possible node pairs among all surviving nodes pro- 
duces the routability of the system under uniform node 
failure probability q. 
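The five steps above can be sketched as a single generic function. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the function and parameter names are hypothetical, the caller supplies the distance distribution n(h) and the success probability p(h, q) for the geometry of interest, and the normalization by (1 - q)N - 1 assumes symmetric nodes so that every surviving node has the same expected reachable component size.

```python
from math import comb
from typing import Callable


def routability(
    d: int,
    q: float,
    n: Callable[[int], float],
    p: Callable[[int, float], float],
    num_nodes: int,
) -> float:
    """RCM estimate of routability for a symmetric DHT.

    d         -- maximum distance (hops or phases) from the root node
    q         -- uniform node failure probability
    n(h)      -- number of nodes at distance h from the root (step 2)
    p(h, q)   -- probability of routing successfully over h hops (step 3)
    num_nodes -- system size N before failures
    """
    # Step 4: expected size of the reachable component of the root node.
    expected_reachable = sum(n(h) * p(h, q) for h in range(1, d + 1))
    # Step 5: divide by the expected number of other surviving nodes.
    return expected_reachable / ((1.0 - q) * num_nodes - 1.0)


# Sanity check with no failures: every node is reachable, so routability is 1.
d_bits = 10
r_no_failure = routability(
    d_bits, 0.0, lambda h: comb(d_bits, h), lambda h, q: 1.0, 2**d_bits
)
```

Passing n and p as callables keeps the geometry-specific work (the Markov chain analysis that yields p) separate from the bookkeeping that is common to all routing systems.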

The formula for computing the expected size of the
reachable component, $E[S_i]$, described in step 4 is derived
as follows:

$$E[S_i] = E\left[\sum_{j=1}^{N} Y_j\right] = \sum_{j=1}^{N} E[Y_j] = \sum_{h=1}^{d} n(h)\, p(h, q) \qquad (1)$$

where $Y_j$ is a Bernoulli random variable denoting whether
node j is reached, and d is the node identifier length.

Since nodes in the system are removed with probability
q, there are (1 - q)N or pN nodes that survive on average.
In step 5, the formula for calculating the routability, r, of
the system under uniform failure probability q is given as

$$r = \frac{M_{rp}}{M_p} = \frac{\sum_{i=1}^{pN} E[S_i]}{pN(pN - 1)} = \frac{E[S]}{(1 - q)N - 1} \qquad (2)$$

where $M_{rp}$ denotes the expected number of routable pairs
among surviving nodes, and $M_p$ is the expected number of
all possible pairs among surviving nodes. Note that the last
equality follows from the observation that the DHTs investigated
in this paper have symmetric nodes. Therefore, the
routing topology of each node is statistically identical to
that of every other node. Thus, the $S_i$ are identically
distributed for all i: $E[S] = E[S_i]$ for all i.

Figure 1. Here we illustrate the reachable component method
using an 8-node hypercube.

Figure 2. We select node 011 to be the root of the routing
graph. The symmetry of the system means that each node
will be the root of a routing graph with identical structure.

Figure 3. For illustration purposes, we examine how
011 routes a message to 100. Note that three choices
exist for the first hop, two choices exist for the second hop,
and only one choice is left for the last hop. For this example,
p(h, q) is: $p(3, q) = (1 - q^3)(1 - q^2)(1 - q)$.

4.2 Using the Hypercube Geometry as an 
Example 

A simple application of the RCM method is illustrated
for the CAN hypercubic routing system in Figs. 1-3. The
RCM steps involved are as follows:

Step 1. As reviewed in section 3, in a hypercube routing
geometry [14], the distance (in hops) between two nodes is 
their Hamming distance. Routing is greedy by correcting 
bits in any order for each hop. 

Step 2. Thus, for any random node i in a hypercube routing
system with identifier length of d bits, we have the following
distance distribution: $n(h) = \binom{d}{h}$. The justification is
immediate: a node h hops away has a Hamming distance
of h bits from node i. Since there are $\binom{d}{h}$ ways to place the
h differing bits, there are $\binom{d}{h}$ nodes at distance h (see Fig. 2).



Step 3. The routing process can be modeled as a discrete
time Markov chain (Figs. 3 and 4(b)). The states $S_i$ of
the Markov chain correspond to the number of corrected
bits. Note that there are only two absorbing states in the
Markov chain: the failure state F and the success state (i.e.
$S_h$). Thus, the probability of successfully routing to a target
node at distance h hops away is given by the probability of
transitioning from $S_0$ to $S_h$ in the Markov chain model:

$$\begin{aligned}
p(h, q) &= \Pr(S_0 \to S_1 \to \cdots \to S_h) \\
&= \Pr(S_0 \to S_1)\Pr(S_1 \to S_2)\cdots\Pr(S_{h-1} \to S_h) \\
&= \prod_{m=1}^{h} (1 - q^m)
\end{aligned}$$

Step 4. Thus, the expected size of the reachable component
is given as:

$$E[S] = \sum_{h=1}^{d} n(h)\, p(h, q) = \sum_{h=1}^{d} \binom{d}{h} \prod_{m=1}^{h} (1 - q^m)$$

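Step 5. Using Eq. 2, we obtain the analytical expression for
routability:

$$r = \frac{\sum_{h=1}^{d} n(h)\, p(h, q)}{(1 - q)2^d - 1} \qquad (3)$$

$$= \frac{\sum_{h=1}^{d} \binom{d}{h} \prod_{m=1}^{h} (1 - q^m)}{(1 - q)2^d - 1} \qquad (4)$$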
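The hypercube expression can be evaluated numerically with a few lines of code. The sketch below is an illustration only; the function names are hypothetical, and it assumes the routability expression has the form $\sum_{h=1}^{d} \binom{d}{h} \prod_{m=1}^{h} (1 - q^m)$ divided by $(1 - q)2^d - 1$.

```python
from math import comb


def hypercube_success(h: int, q: float) -> float:
    """p(h, q) = prod_{m=1}^{h} (1 - q^m): probability of routing h hops."""
    prob = 1.0
    for m in range(1, h + 1):
        prob *= 1.0 - q**m  # at least one of the m useful neighbors is alive
    return prob


def hypercube_routability(d: int, q: float) -> float:
    """Routability of a d-bit hypercube under node failure probability q."""
    expected_reachable = sum(
        comb(d, h) * hypercube_success(h, q) for h in range(1, d + 1)
    )
    return expected_reachable / ((1.0 - q) * 2**d - 1.0)


# Worked value matching the three-hop example: (1 - q^3)(1 - q^2)(1 - q).
p3_at_half = hypercube_success(3, 0.5)  # 0.875 * 0.75 * 0.5 = 0.328125
```

With q = 0 every factor in the product equals one, so the numerator reduces to $2^d - 1$ and the routability is exactly 1, which is a quick consistency check on the normalization.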
Figure 4. The above diagrams illustrate the Markov chain model of the routing process to a target at distance h hops from the
root node. Note that there are only two absorbing states in these Markov chains: the failure state (denoted by F) and the success
state (denoted by $S_h$). (a) Markov chain model for tree routing: The $S_i$ represent the states that correspond to the number of corrected
ordered bits. At each $S_i$, the neighbor that will correct the leftmost bit must be present in order for the message to be routed.
Otherwise, the message is dropped and routing fails. Thus, the transition probability from $S_i$ to $S_{i+1}$ is 1 - q, while the transition
probability to the failure state is q. (b) Markov chain model for hypercube routing: Here, the $S_i$ represent the states that correspond
to the number of corrected bits in any order. The transition probabilities are obtained by noting that at state $S_i$, there are h - i neighbors
to route the message to.



4.3 Summary of Results for other Routing 
Geometries 



Using the RCM method, the analytical expressions for
the other DHT routing geometries can be derived in the same way
as for the hypercube routing geometry. In all the derivations,
the majority of the work involves finding the expression
for p(h, q) through Markov chain modeling. Note that
the analytical expressions derived in this section are compared
with the simulation results obtained by Gummadi et
al. [2] in Fig. 6(a) and 6(b).

For ease of exposition, we will use the notation G(i, j),
which denotes the probability that, starting at state i,
the Markov chain ever visits state j. By any of the
Markov chain models for the routing protocols, we note that
$G(S_0, S_1) = 1 - Q(h)$, $G(S_1, S_2) = 1 - Q(h - 1)$, and
so forth, where the function Q(m) can be thought of as the
probability of failure at the mth phase of the routing process.
As a result, all of the DHT systems under study have
the property that the probability of successfully traveling h
hops or phases from the root node, p(h, q), is given by the
following common form:

$$p(h, q) = G(S_0, S_1)\, G(S_1, S_2) \cdots G(S_{h-1}, S_h) = \prod_{m=1}^{h} (1 - Q(m))$$

Using Eq. 3, we see that only the expressions for n(h) and
Q(m) are needed to compute the routability of the DHT
routing system under investigation. As a result, we will only
provide the n(h) and Q(m) expressions for each system for
conciseness.
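The common product form lends itself to a small helper that turns any per-phase failure function Q(m) into p(h, q). The sketch below is illustrative; the helper name is hypothetical, and the two example Q functions (a constant q for the tree and $q^m$ for the hypercube) are read off the Markov chain descriptions given earlier.

```python
from typing import Callable


def success_probability(h: int, Q: Callable[[int], float]) -> float:
    """p(h, q) = prod_{m=1}^{h} (1 - Q(m)) for a per-phase failure function Q."""
    prob = 1.0
    for m in range(1, h + 1):
        prob *= 1.0 - Q(m)
    return prob


q = 0.5
# Tree: the single neighbor that corrects the leftmost bit must be alive.
p_tree = success_probability(3, lambda m: q)
# Hypercube: routing fails in a phase only if all m useful neighbors failed.
p_hypercube = success_probability(3, lambda m: q**m)
```

Because only Q changes between geometries, comparing two routing systems reduces to comparing how quickly their Q(m) decays as m grows.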



4.3.1 Tree 

For the tree routing geometry, the routing distance distribution,
n(h), is $\binom{d}{h}$ by inspection. Furthermore, it is simple
to show that $p(h, q) = (1 - q)^h$ by examining the
Markov chain model (see Fig. 4(a)). In sum, the expression
for routability can be succinctly given as follows:

$$r = \frac{(2 - q)^d - 1}{(1 - q)2^d - 1}$$
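The closed form for the tree follows from the binomial theorem, since $\sum_{h=1}^{d} \binom{d}{h} (1 - q)^h = (2 - q)^d - 1$. The sketch below checks this numerically and evaluates the expression for growing d; the function names are hypothetical and the code is an illustration, not the authors' implementation.

```python
from math import comb


def tree_routability_sum(d: int, q: float) -> float:
    """Direct RCM sum with n(h) = C(d, h) and p(h, q) = (1 - q)^h."""
    expected_reachable = sum(
        comb(d, h) * (1.0 - q) ** h for h in range(1, d + 1)
    )
    return expected_reachable / ((1.0 - q) * 2**d - 1.0)


def tree_routability_closed(d: int, q: float) -> float:
    """Closed form ((2 - q)^d - 1) / ((1 - q) * 2^d - 1)."""
    return ((2.0 - q) ** d - 1.0) / ((1.0 - q) * 2.0**d - 1.0)


# Routability for increasing identifier lengths at a fixed failure probability.
tree_values = [tree_routability_closed(d, 0.1) for d in (8, 16, 100)]
```

For large d the closed form behaves like $(1 - q/2)^d / (1 - q)$, which decays geometrically in d for any q > 0; the values in `tree_values` decrease accordingly.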
Figure 5. (a) XOR routing under failure: in this simple example, node 010 tries to route a message to node 101.
However, its first neighbor 111 (i.e. the randomly chosen node that flips the first bit and chooses random bits for the rest of the
identifier bits) has just failed. As a result, the message is routed to node 010's second neighbor, node 000, correcting a lower
order bit. Now, node 000's first neighbor, node 110, is available, and node 110's second neighbor, node 100, is also available.
Consequently, the message is routed to the destination node 101 by following the dashed arrows in the diagram. (b) Markov chain
model for XOR routing: this diagram illustrates routing to a target located at h phases in distance, which is equivalent to correcting
h bits in order (left to right). The $S_i$ denote the states that correspond to the number of corrected ordered bits, which is equivalent
to the number of phases advanced. The states (i, j) denote a state that corresponds to j suboptimal hops taken after advancing i
phases.



4.3.2 XOR 

As reviewed in section 3, connecting to a neighbor at an
XOR distance of $[2^{d-i}, 2^{d-i+1}]$ is equivalent to choosing a
neighbor by matching the first (i-1) bits of one's identifier,
flipping the ith bit, and choosing random bits for the rest of
the bits. Note that this is equivalent to how neighbors are
chosen in the Plaxton-tree routing geometry. As a result,
the n(h) expression is given as $n(h) = \binom{d}{h}$, just as in the
tree case.

Now, let us examine how the Markov chain model (Fig.
5(b)) is obtained: in this scenario, a message is to be routed
to a destination h phases away; starting at state $S_0$, state $S_1$
is reached if the optimal neighbor correcting the leftmost
bit exists, which happens with probability 1 - q ($S_i$ denotes
the state that corresponds to the ith advanced phase). However,
if all h neighbors have failed (i.e. with probability
$q^h$), the failure state F is entered. Otherwise, the routing
process can correct one of the lower order bits, which
happens with probability $q(1 - q^{h-1})$. Note that there is
a maximum number of h - 1 lower order bits that can be
corrected in the first phase. All other transition probabilities
can be obtained similarly. By inspecting the Markov
chain model, we note that $G(S_0, S_1) = 1 - Q_{xor}(h)$,
$G(S_1, S_2) = 1 - Q_{xor}(h - 1)$, and so forth, where the
function $Q_{xor}(m)$ is defined as follows:

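$$Q_{xor}(m) = q^m \left[ 1 + \sum_{k=1}^{m-1} \prod_{j=m-k}^{m-1} (1 - q^j) \right] \qquad (6)$$

$$\approx q^m \left( m + \frac{q}{1 - q} \left( q^{m-1}(m - 1) - \frac{1 - q^{m-1}}{1 - q} \right) \right)$$

The approximation is obtained by invoking the following:
$1 - x \approx e^{-x}$ for x small.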
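The exact and approximate forms of the XOR failure function can be compared numerically. The sketch below is hedged: it assumes the exact expression reads $Q_{xor}(m) = q^m [1 + \sum_{k=1}^{m-1} \prod_{j=m-k}^{m-1} (1 - q^j)]$ and that the approximation reads $q^m (m + \frac{q}{1-q}(q^{m-1}(m-1) - \frac{1 - q^{m-1}}{1-q}))$; the function names are hypothetical.

```python
def q_xor_exact(m: int, q: float) -> float:
    """q^m * [1 + sum_{k=1}^{m-1} prod_{j=m-k}^{m-1} (1 - q^j)]."""
    total = 1.0  # k = 0 term: fail immediately, all m neighbors are down
    for k in range(1, m):
        prod = 1.0
        for j in range(m - k, m):
            prod *= 1.0 - q**j  # survive k suboptimal hops before failing
        total += prod
    return q**m * total


def q_xor_approx(m: int, q: float) -> float:
    """First-order approximation of the expression above, valid for small q."""
    inner = q ** (m - 1) * (m - 1) - (1.0 - q ** (m - 1)) / (1.0 - q)
    return q**m * (m + q / (1.0 - q) * inner)


# For m = 2 each product has a single factor, so both forms coincide.
exact_m2 = q_xor_exact(2, 0.5)    # 0.25 * (1 + 0.5) = 0.375
approx_m2 = q_xor_approx(2, 0.5)  # 0.25 * (2 - 0.5) = 0.375
```

The key point for scalability is the leading factor $q^m$: the per-phase failure probability shrinks geometrically as more phases remain, so the product $\prod_m (1 - Q_{xor}(m))$ stays bounded away from zero.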

4.3.3 Ring 

In ring routing as implemented in Chord, when a node takes
a suboptimal hop in the routing process, the progress made
by taking this suboptimal hop is preserved in later hops.
For example, consider the scenario that a message is to be
routed to a node at a numeric distance that is O(N) (i.e. the
message is to be routed one full circle around the ring), and
the fingers are connected to nodes that are half way across
the ring, one quarter across the ring, etc. For the message's
first hop, it takes a suboptimal hop which takes the message
only one quarter across the ring, because the finger
that would have taken the message half way across the ring
has failed. Then, for the message's second hop, none of the
finger connections has failed. Thus, the message takes an
optimal hop which takes the message half way across the
ring. Therefore, after two hops, the message is now three
quarters of the way across the ring. Note that the progress
made in the first suboptimal hop in this scenario is later
preserved by a subsequent hop.

Figure 6. Both plots show the percentage of failed paths (i.e. 1 - routability) for varying node failure probability and system size
of $N = 2^{16}$. (a) Analysis vs simulation: The simulation data points are reproduced from [2]. For all three routing geometries, the
analytical curves show a great fit to the simulation curves. (b) Analysis vs simulation (ring): For the ring routing algorithm, the
discrepancy between the analytical and simulation curves is due to the algorithm's property that suboptimal hops contribute non-trivially
to the routing process. In effect, the analytical curve provides an upper bound for the percentage of failed paths. Note that the
analytical curve is very close to simulation in the region of practical interest (i.e. for failure probability less than 20%).

Figure 7. (a) Asymptotic limit: This figure plots the percentage of failed paths (i.e. 1 - routability) for varying node failure probability in the
asymptotic limit. The curves are obtained by evaluating the analytical expressions at $N = 2^{100}$. Note that the curves for tree and Symphony are very
close to a step function, which is consistent with our analysis. In addition, the curves for the other three geometries are very close to the case for
$N = 2^{16}$. (b) Routability vs N: This plot shows the routability of the routing geometries for varying system size and a constant failure probability
(q = 0.1). This figure clearly demonstrates the lack of scalability of the tree and Symphony routing geometries. As the system scales, the routability
of both the tree and Symphony routing systems monotonically degrades toward zero. In contrast, the other three geometries remain highly routable
in the face of failure even as the systems scale to billions of nodes. (In both plots, we set the number of near neighbors and the number of shortcuts
equal to one for Symphony.)

This property that suboptimal hops in ring routing contribute
non-trivially to the routing process is not accounted
for in the Markov chain model illustrated in Fig. 8(a).
The reason is that accounting for progress made by suboptimal
hops would lead to an exponential blowup in the number
of terms that we need to keep track of for computing
p(h, q). This simplified Markov chain model essentially
makes the assumption that progress made by suboptimal
hops does not contribute to the routing process. Therefore,
the analytical expression for p(h, q) using this model provides
a lower bound.

The Markov chain model for ring routing (Fig. 8(a)) is very similar to the one for XOR routing (Fig. 5(b)). However, fundamental differences exist. First, when a suboptimal hop is taken in Chord, the number of next-hop choices does not decrease. For example, in the first phase there are $h$ choices for the next hop, so the transition probabilities from the states in the first phase to the failure state are all given by $q^h$. In contrast, the corresponding transition probabilities in Fig. 5(b) are given by $q^h$, $q^{h-1}$, and so forth. In addition, the maximum number of suboptimal hops in Chord is given by $2^{h-1}$, $2^{h-2}$, and so forth, while the corresponding maxima in Fig. 5(b) are given by $h$, $h - 1$, and so forth. This difference is due to the fact that in XOR routing, routing fails if all the lower-order bits are resolved and the leftmost bit is not yet resolved. Chord has no such restriction. The result for the ring routing geometry is derived by inspecting Fig. 8(a):

$$Q_{ring}(m) = q^m \sum_{k=0}^{2^{m-1}} \left[ q \left( 1 - q^{m-1} \right) \right]^k$$

In addition, one can easily see by inspection that the $n(h)$ expression for the ring geometry is given by $n(h) = 2^{h-1}$.
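A short numerical sketch of the ring model may help here. It assumes that the per-phase failure probability has the form $Q_{ring}(m) = q^m \sum_{k=0}^{K} [q(1 - q^{m-1})]^k$ with $K = 2^{m-1}$ suboptimal hops at most in phase $m$; the upper limit $K$ is an assumption inferred from the surrounding text, and the function names are hypothetical.

```python
def q_ring(m, q):
    """Failure probability at phase m of the simplified ring Markov chain.

    Assumed form: q**m * sum_{k=0}^{K} r**k with r = q * (1 - q**(m - 1))
    and K = 2**(m - 1), the assumed cap on suboptimal hops in phase m.
    """
    r = q * (1.0 - q ** (m - 1))
    max_suboptimal = 2 ** (m - 1)
    # Closed form of the finite geometric sum; r < 1 whenever 0 <= q <= 1.
    geometric_sum = (1.0 - r ** (max_suboptimal + 1)) / (1.0 - r)
    return q ** m * geometric_sum


def p_ring_lower_bound(h, q):
    """Lower bound on p(h, q): product of (1 - Q_ring(m)) for m = 1..h."""
    prob = 1.0
    for m in range(1, h + 1):
        prob *= 1.0 - q_ring(m, q)
    return prob


if __name__ == "__main__":
    for h in (5, 10, 20, 40):
        print(h, p_ring_lower_bound(h, q=0.1))
```

Under these assumptions the printed values settle quickly as $h$ grows, because $Q_{ring}(m)$ shrinks roughly like $q^m$.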

4.3.4 Symphony 

Symphony's Markov chain model (Fig. 8(b)) is fundamentally different from the ones for XOR routing (Fig. 5(b)) and ring routing (Fig. 8(a)). Starting at $S_0$, one phase is



advanced if any of the node's shortcuts connects to the de- 
sired phase, which happens with probability ^ where ks 
denotes the number of shortcuts. Alternatively, the routing fails if all of the node's near-neighbor and shortcut connections fail, which happens with probability $q^{k_n + k_s}$. The third possibility is taking a suboptimal hop, which happens with the remaining probability, i.e., 1 minus the phase-advance probability minus $q^{k_n + k_s}$. All other transition probabilities in the Markov chain can be similarly derived. Note that
we approximate the maximum number of suboptimal hops 

by Tt^I- 

For the Symphony routing geometry, we note that the expression for the $Q$'s is constant for all phases. The results are derived in the same way as for the other systems, by inspecting Fig. 8(b).

The symbols $k_n$ and $k_s$ denote the number of near neighbors and shortcuts, respectively. Similarly to ring routing, the $n(h)$ expression for the Symphony routing algorithm is given by $n(h) = 2^{h-1}$.

5 Scalability of DHT Routing Protocols under Random Failure

For a DHT routing system to be scalable, its routability must converge to a non-zero value as the system size goes to infinity (Definition 2). Alternatively, we examine the asymptotic behavior of $p(h, q)$ with $h$ set to the average routing distance in the system (i.e., $h \sim O(\log N)$, or $O(\log^2 N)$ for Symphony). Using Eq. 3, it is simple to show that the equivalent condition for scalability is as follows:

$$\lim_{N \to \infty} p(h, q) = \lim_{h \to \infty} p(h, q) > 0 \quad \text{for } 0 < q < 1 - p_c \qquad (8)$$

Otherwise, the routing system is unscalable. In other words, 
the equivalent condition for system scalability states that as 
the number of routing hops to reach a destination node in the 
system approaches infinity, the probability of successfully 
routing to the destination node must not drop to zero for a 
non-zero node failure probability in the system. 

As discussed in Section 4.3, all of the DHT systems under study have the property that the probability of successfully traveling $h$ hops or phases from the root node takes the following form:

$$p(h, q) = \prod_{m=1}^{h} \left( 1 - Q(m) \right) \qquad (9)$$

where $Q(m)$ can be thought of as the probability of failure at the $m$th phase of the routing process.

Figure 8. The above two diagrams illustrate the Markov chain model for ring and Symphony routing geometries.

Theorem 1 (From Knopp [4]) If, for every $n$, $0 < a_n < 1$, then the product $\prod (1 - a_n)$ tends to a limit greater than 0 if and only if $\sum a_n$ converges.

Theorem 1 allows us to conveniently convert our problem of determining the convergence of an infinite product into the simpler problem of determining the convergence of an infinite sum. Thus, $p(h, q)$ converges to a non-zero limit as $h \to \infty$ if and only if $\sum Q(m)$ converges.
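The product-versus-sum criterion can be illustrated numerically. The sketch below is only an illustration under simple assumptions: it evaluates $\prod_{m=1}^{h} (1 - Q(m))$ for a constant per-phase failure probability $Q(m) = q$ (a divergent sum) and for $Q(m) = q^m$ (a convergent sum). The function names are hypothetical.

```python
def success_probability(h, failure_at_phase):
    """Product of (1 - Q(m)) for m = 1..h, with Q supplied as a function."""
    prob = 1.0
    for m in range(1, h + 1):
        prob *= 1.0 - failure_at_phase(m)
    return prob


Q_FAIL = 0.1  # node failure probability q used in this illustration


def constant_q(m):
    """Q(m) = q for every phase: the sum of Q(m) diverges."""
    return Q_FAIL


def geometric_q(m):
    """Q(m) = q**m: the sum of Q(m) is a convergent geometric series."""
    return Q_FAIL ** m


if __name__ == "__main__":
    for h in (10, 100, 1000):
        print(h,
              success_probability(h, constant_q),
              success_probability(h, geometric_q))
```

The first column of results decays toward zero as $h$ grows, while the second stays near a fixed positive value, in line with Theorem 1.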

5.1 Tree 

The tree routing geometry can be trivially shown to be unscalable:

$$\lim_{h \to \infty} (1 - q)^h = 0 \quad \text{for any } q > 0 \qquad (10)$$



5.2 Hypercube 

For hypercube routing, $p(h, q)$ is given by $p(h, q) = \prod_{m=1}^{h} (1 - q^m)$ (Eq. 13). By invoking Theorem 1, it is trivial to see that $\sum q^m$, a geometric series, converges for $0 < q < 1 - p_c$. Thus, the hypercube routing geometry is scalable.
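As a worked check of this step, the geometric series can be summed explicitly. The explicit lower bound on the product uses the elementary inequality $\prod (1 - a_m) \ge 1 - \sum a_m$ for $0 \le a_m \le 1$, which is an addition for illustration and not part of the original derivation:

```latex
\sum_{m=1}^{\infty} q^m = \frac{q}{1-q} < \infty \quad (0 < q < 1),
\qquad
\prod_{m=1}^{h} \left( 1 - q^m \right)
  \;\ge\; 1 - \sum_{m=1}^{h} q^m
  \;>\; 1 - \frac{q}{1-q} = \frac{1 - 2q}{1 - q}.
```

This explicit bound is informative only for $q < 1/2$; for larger $q$, Theorem 1 still guarantees a positive limit because the sum converges.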

5.3 XOR 

In XOR routing, the $Q(m)$ expression is given by Eq. 6. It is simple to show that the $Q(m)$ series involves only $q^m$ and $m q^m$ terms. Thus, $\sum Q(m)$ is convergent and the XOR routing scheme is scalable.
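The convergence claim follows from two standard series. In the sketch below, $A$ and $B$ are hypothetical constant coefficients standing in for whatever multiplies the $q^m$ and $m q^m$ terms of $Q(m)$:

```latex
\sum_{m=1}^{\infty} q^m = \frac{q}{1-q},
\qquad
\sum_{m=1}^{\infty} m\, q^m = \frac{q}{(1-q)^2},
\qquad 0 < q < 1,
\quad\Longrightarrow\quad
\sum_{m=1}^{\infty} \left( A\, q^m + B\, m\, q^m \right)
  = \frac{A\, q}{1-q} + \frac{B\, q}{(1-q)^2} < \infty .
```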



5.4 Ring 

We will demonstrate that the ring routing geometry is also scalable by showing that the XOR result derived above is a lower bound for the ring geometry. We compare the Markov chain models for the ring geometry and the XOR geometry (Fig. 8(a) and Fig. 5(b)). We note that the transition probabilities for the suboptimal hops in ring are strictly greater than the corresponding probabilities for XOR. For example, in Fig. 8(a), the transition probabilities for $S_0 \to (0, 1)$, $(0, 1) \to (0, 2)$, and so forth are given by $q(1 - q^{h-1})$. These probabilities are strictly greater than the corresponding transition probabilities in Fig. 5(b). Thus, by comparing these two Markov chain models, it is simple to show that the $p(h, q)$ expression for the ring routing geometry is strictly greater than the $p(h, q)$ expression for XOR routing. Thus, the ring routing geometry is also scalable.

5.5 Symphony 

In Symphony routing, the $Q(m)$ expression derived in Section 4.3.4 is a positive constant, independent of $m$, for any $q > 0$. Therefore, $\sum Q(m)$ is divergent and the Symphony routing scheme is unscalable.
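The consequence of a constant per-phase failure probability can be written out directly. Here $c$ is a hypothetical name for that constant, with $0 < c < 1$:

```latex
Q(m) = c > 0 \ \text{for all } m
\quad\Longrightarrow\quad
p(h, q) = \prod_{m=1}^{h} (1 - c) = (1 - c)^h \le e^{-c h}
\xrightarrow[h \to \infty]{} 0 .
```

This is the same mechanism that makes the tree geometry unscalable, where the per-phase failure probability is $q$.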



Please refer to Figs. 7(a) and 7(b) for plots of the above scalability results.

6 Concluding Remarks 

In this work, we present the reachable component 
method (RCM) which is an analytical framework for char- 
acterizing DHT system performance under random failures. 
The method's efficacy is demonstrated through an analysis
of five important existing DHT systems and the good agree- 
ment of the RCM predictions for each system with simu- 
lation results from the literature. Researchers involved in 
P2P system design and implementation can use the method 
to assess the performance of proposed architectures and to 
choose robust routing algorithms for application develop- 
ment. In addition, although the analysis presented in this 
work assumes fully-populated identifier spaces, analytical 
results for real-world DHTs with non-fully-populated identifier spaces can be similarly derived. Detailed investigation in this area is left for future work.

One of the most interesting implications of this analysis is that in the large-network limit, some DHT routing systems are incapable of routing to a constant fraction of the network if there is any non-zero probability of random node failure. These DHT algorithms are therefore considered to be unscalable. Other algorithms are more robust to random node failures, allowing each node to route to a constant fraction of the network even as the system size goes to infinity. These systems are considered to be scalable. Now that real DHT implementations have on the order of millions of highly transient nodes, it is increasingly important to characterize how the size and failure conditions of a DHT will affect its routing performance.